
    EEG Searchlight Decoding Reveals Person- and Place-specific Responses for Semantic Category and Familiarity.

    Proper names are linguistic expressions referring to unique entities, such as individual people or places. This sets them apart from other words, like common nouns, which refer to generic concepts. And yet, despite both being individual entities, one's closest friend and one's favorite city are intuitively associated with very different pieces of knowledge: face, voice, social relationships, and autobiographical experiences for the former, and mostly visual and spatial information for the latter. Neuroimaging research has revealed the existence of both domain-general and domain-specific brain correlates of the semantic processing of individual entities; however, it remains unclear how such commonalities and specificities operate over a fine-grained temporal scale. In this work, we tackle this question using EEG and multivariate (time-resolved and searchlight) decoding analyses. We examine when and where the semantic category of a proper name can be accurately decoded, and whether person- or place-specific effects of familiarity can be found; familiarity is a modality-independent dimension and therefore avoids the sensorimotor differences inherent to the two categories. Semantic category can be decoded in a time window, and with a spatial localization, typically associated with lexical-semantic processing. Regarding familiarity, our results reveal, first, that it is easier to distinguish patterns of familiarity-related evoked activity for people than for places, in both early and late time windows; second, that within the early responses both domain-general (left posterior-lateral) and domain-specific (right fronto-temporal, only for people) neural patterns can be individuated, suggesting the existence of person-specific processes.
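    The time-resolved decoding analysis mentioned above can be pictured with a small sketch. The Python example below is purely illustrative (not the authors' pipeline): it assumes epoched EEG data as a trials × channels × time array and cross-validates a classifier independently at each time sample; a searchlight variant would additionally restrict the features to a neighbourhood of sensors around each channel.

```python
# Illustrative sketch: time-resolved decoding of a binary semantic category
# (e.g. person vs. place) from epoched EEG data. All shapes and data are assumed.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def time_resolved_decoding(X, y, cv=5):
    """Fit an independent classifier at every time sample and return
    cross-validated accuracy as a function of time."""
    n_trials, n_channels, n_times = X.shape
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = np.zeros(n_times)
    for t in range(n_times):
        # Features at time t: the pattern of voltages across all channels.
        scores[t] = cross_val_score(clf, X[:, :, t], y, cv=cv).mean()
    return scores

# Random data standing in for real epochs: 120 trials, 64 channels, 200 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 64, 200))
y = rng.integers(0, 2, size=120)      # 0 = person, 1 = place
accuracy_over_time = time_resolved_decoding(X, y)
```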

    AraNLP: A Java-based library for the processing of Arabic text

    We present a free, Java-based library named "AraNLP" that covers various Arabic text preprocessing tools. Although a good number of tools for processing Arabic text already exist, integration and compatibility problems continually occur. AraNLP is an attempt to gather most of the vital Arabic text preprocessing tools into one library that can be accessed easily, by integrating or accurately adapting existing tools and by developing new ones when required. The library includes a sentence detector, tokenizer, light stemmer, root stemmer, part-of-speech tagger (POS tagger), word segmenter, normalizer, and a punctuation and diacritic remover.

    A semi-supervised learning approach to Arabic named entity recognition

    We present ASemiNER, a semi-supervised algorithm for identifying Named Entities (NEs) in Arabic text. ASemiNER requires neither annotated training data nor gazetteers, and it can easily be adapted to handle more than the three standard NE types (Person, Location, and Organisation). To our knowledge, ours is the first study that intensively investigates the semi-supervised, pattern-based learning approach to Arabic Named Entity Recognition (NER). We describe ASemiNER and compare its performance with different supervised systems, evaluating it experimentally on the extraction of the three standard named-entity types. Ultimately, our algorithm outperforms simple supervised systems and also performs well when evaluated on the extraction of three new, specialised types of NEs (Politicians, Sportspersons, and Artists).
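    The semi-supervised, pattern-based approach investigated here can be sketched generically; the code below is not ASemiNER's actual algorithm, and all names in it are hypothetical. The idea: start from a few seed entities, collect the contextual patterns that surround them in unlabelled text, use the most frequent patterns to harvest new candidate entities, and iterate.

```python
# Generic pattern-based bootstrapping for NER (illustrative only).
from collections import Counter

def extract_patterns(sentences, lexicon, window=2):
    """Collect (left-context, right-context) patterns around known entities."""
    patterns = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in lexicon:
                left = tuple(tokens[max(0, i - window):i])
                right = tuple(tokens[i + 1:i + 1 + window])
                patterns[(left, right)] += 1
    return patterns

def match_candidates(sentences, patterns, window=2, top_k=20):
    """Find tokens occurring inside the highest-frequency patterns."""
    top = {p for p, _ in patterns.most_common(top_k)}
    candidates = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            left = tuple(tokens[max(0, i - window):i])
            right = tuple(tokens[i + 1:i + 1 + window])
            if (left, right) in top:
                candidates[tok] += 1
    return candidates

def bootstrap(sentences, seeds, rounds=3, new_per_round=10):
    """Grow an entity lexicon from seed examples over a few iterations."""
    lexicon = set(seeds)
    for _ in range(rounds):
        patterns = extract_patterns(sentences, lexicon)
        for tok, _ in match_candidates(sentences, patterns).most_common(new_per_round):
            lexicon.add(tok)
    return lexicon
```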

    Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia

    In this paper we propose a new methodology that exploits Wikipedia's features and structure to automatically develop an Arabic NE annotated corpus. Each Wikipedia link is mapped to the NE type of the article it targets, in order to produce the NE annotation. Other Wikipedia features, namely redirects, anchor texts, and inter-language links, are used to tag additional NEs that appear without links in Wikipedia texts. Furthermore, we have developed a filtering algorithm to eliminate ambiguity when tagging candidate NEs. We also introduce a mechanism based on the high coverage of Wikipedia to address two challenges particular to tagging NEs in Arabic text: rich morphology and the absence of capitalisation. The corpus created with our new method (WDC) has been used to train an NE tagger, which has been tested on different domains. Judging by the results, an NE tagger trained on WDC can compete with those trained on manually annotated corpora.
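    The core link-to-annotation transformation can be sketched as follows; the article-to-type mapping and the token/tag output format below are placeholders, not the paper's actual pipeline, which additionally exploits redirects, anchor texts, inter-language links, and the filtering algorithm described above.

```python
# Illustrative sketch: turning Wikipedia links into NE annotations.
import re

# Hypothetical mapping from article titles to NE types (in practice derived
# from the target article's categories/infobox).
article_type = {"Cairo": "LOC", "Naguib_Mahfouz": "PER", "Al_Ahly_SC": "ORG"}

WIKILINK = re.compile(r"\[\[([^|\]]+)(?:\|([^\]]+))?\]\]")

def annotate_links(wikitext):
    """Replace [[Target|anchor]] links with token/tag pairs when the target
    article has a known NE type; other links are kept as plain text."""
    def repl(match):
        target = match.group(1).replace(" ", "_")
        anchor = match.group(2) or match.group(1)
        ne_type = article_type.get(target)
        if ne_type is None:
            return anchor
        tokens = anchor.split()
        tags = ["B-" + ne_type] + ["I-" + ne_type] * (len(tokens) - 1)
        return " ".join(f"{t}/{g}" for t, g in zip(tokens, tags))
    return WIKILINK.sub(repl, wikitext)

print(annotate_links("[[Naguib_Mahfouz|Naguib Mahfouz]] was born in [[Cairo]]"))
# -> Naguib/B-PER Mahfouz/I-PER was born in Cairo/B-LOC
```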

    A Model for Automatic Extraction of Slowdowns From Traffic Sensor Data

    The ability to identify slowdowns from a stream of traffic sensor readings in an automatic fashion is a core building block for any application that incorporates traffic behaviour into its analysis process. The methods proposed in this paper treat slowdowns as valley-shaped data sequences that fall below a normal distribution interval. The paper proposes a model for slowdown identification and partitioning across multiple periods of time, which aims to serve as a first layer of knowledge about the traffic environment. The model can be used to extract regularities from a set of events of interest with recurring behaviour and to assert the consistency of the extracted patterns. The proposed methods are evaluated using real data collected from highway traffic sensors.
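    The valley-shaped-sequence idea can be illustrated with a minimal sketch, under assumptions that are not the paper's model: readings falling below a lower bound of the series' distribution (here, mean minus k standard deviations) are treated as part of a slowdown, and consecutive sub-threshold readings are grouped into slowdown intervals.

```python
# Illustrative slowdown extraction from a speed series (assumed data and thresholds).
import numpy as np

def extract_slowdowns(speeds, k=1.0, min_len=3):
    """Return (start, end) half-open index ranges of consecutive readings whose
    speed drops below mean - k * std of the series."""
    speeds = np.asarray(speeds, dtype=float)
    threshold = speeds.mean() - k * speeds.std()
    below = speeds < threshold
    slowdowns, start = [], None
    for i, flag in enumerate(below):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                slowdowns.append((start, i))
            start = None
    if start is not None and len(speeds) - start >= min_len:
        slowdowns.append((start, len(speeds)))
    return slowdowns

# A dip around indices 6-8 in otherwise free-flowing traffic.
readings = [100, 98, 102, 99, 97, 60, 30, 28, 35, 50, 96, 101, 99]
print(extract_slowdowns(readings))   # -> [(6, 9)] with these illustrative settings
```

    In a real deployment the baseline mean and standard deviation would come from historical data for the same sensor and time of day rather than from the window being analysed.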

    Cross-participant modelling based on joint or disjoint feature selection: an fMRI conceptual decoding study

    Multivariate classification techniques have proven to be powerful tools for distinguishing experimental conditions in single sessions of functional magnetic resonance imaging (fMRI) data, but they suffer a considerable penalty in classification accuracy when applied across sessions or participants, calling into question the degree to which fine-grained encodings are shared across subjects. Here, we introduce joint learning techniques, in which feature selection is carried out using a held-out subset of a target dataset before a linear classifier is trained on a source dataset. Single trials of functional MRI data from a covert property-generation task are classified with regularized regression techniques to predict the semantic class of stimuli. With our selection techniques (joint ranking feature selection (JRFS) and disjoint feature selection (DJFS)), classification performance during cross-session prediction improved greatly relative to feature selection on the source-session data only. Compared with JRFS, DJFS showed significant improvements for cross-participant classification, and with groupwise training DJFS approached the accuracies seen for prediction across different sessions from the same participant. Comparing several feature selection strategies, we found that a simple univariate ANOVA selection technique or a minimal searchlight (one voxel in size) is appropriate, compared with larger searchlights.
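    The general recipe can be sketched with standard scikit-learn components; this is an assumption-laden illustration, not the authors' implementation. Voxels are selected with a univariate ANOVA score computed on a held-out split of the target data, a regularized linear classifier is trained on the source session, and accuracy is measured on the remaining target trials.

```python
# Illustrative cross-session decoding with target-informed feature selection.
# X_src/X_tgt: (n_trials, n_voxels) arrays; y_src/y_tgt: class labels.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

def cross_session_accuracy(X_src, y_src, X_tgt, y_tgt,
                           n_voxels=500, held_out=0.2, seed=0):
    rng = np.random.default_rng(seed)
    n_tgt = len(y_tgt)
    held = rng.choice(n_tgt, size=int(held_out * n_tgt), replace=False)
    test = np.setdiff1d(np.arange(n_tgt), held)

    # Univariate ANOVA feature (voxel) selection on the held-out target trials.
    selector = SelectKBest(f_classif, k=n_voxels).fit(X_tgt[held], y_tgt[held])

    # Regularized linear classifier trained on the source session only.
    clf = LogisticRegression(C=1.0, max_iter=1000)
    clf.fit(selector.transform(X_src), y_src)

    # Evaluate on target trials that were not used for feature selection.
    return clf.score(selector.transform(X_tgt[test]), y_tgt[test])
```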

    The effect of linguistic and visual salience in visual world studies

    Research using the visual world paradigm has demonstrated that visual input has a rapid effect on language interpretation tasks such as reference resolution and, conversely, that linguistic material, including verbs, prepositions, and adjectives, can influence fixations to potential referents. More recent research has started to explore how this effect of linguistic input on fixations is mediated by properties of the visual stimulus, in particular by visual salience. In the present study we further explored the role of salience in the visual world paradigm by manipulating language-driven salience and visual salience. Specifically, we tested how linguistic salience (i.e., the greater accessibility of linguistically introduced entities) and visual salience (bottom-up, attention-grabbing visual aspects) interact. We recorded participants' eye movements during a MapTask, asking them to look from landmark to landmark displayed upon a map while hearing direction-giving instructions. The landmarks were of comparable size and color, except in the Visual Salience condition, in which one landmark had been made more visually salient. In the Linguistic Salience conditions, the instructions included references to an object not on the map. Response times and fixations were recorded. Visual Salience influenced the time course of fixations at both the beginning and the end of the trial but did not show a significant effect on response times. Linguistic Salience reduced response times and increased fixations to landmarks when they were associated with a linguistically salient entity that was not itself present on the map. When the target landmark was both visually and linguistically salient, it was fixated longer, but fixations were quicker when the target item was only linguistically salient. Our results suggest that the two types of salience work in parallel and that linguistic salience affects fixations even when the entity is not visually present. © 2014 Cavicchio, Melcher and Poesio.

    Discourse Structure and Anaphora: An Empirical Study

    One of the main motivations for studying discourse structure is its effect on the search for the antecedents of anaphoric expressions. We tested the predictions made in this regard by theories which assume that the structure of a discourse depends on its intentional structure, such as Grosz and Sidner's theory. We used a corpus of tutorial dialogues independently annotated according to Relational Discourse Analysis (RDA), a theory of discourse structure merging ideas from Grosz and Sidner's theory with proposals from Rhetorical Structure Theory (RST). Using as our metrics the accessibility of anaphoric antecedents and the reduction in ambiguity brought about by a particular theory, we found support for Moser and Moore's proposal that, among the units of discourse assumed by an RST-like theory, only those expressing an intentional 'core' (in the RDA sense) should be viewed as constraining the search for antecedents; units expressing only informational relations should not introduce separate focus spaces. We also found that the best compromise between accessibility and ambiguity ('perplexity') reduction is a model in which the focus spaces associated with embedded cores and embedded contributors remain on the stack until the RDA-segment in which they occur is completed, and we discuss the implications of this finding for a stack-based theory.
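    The stack-based model referred to in the last sentence can be pictured with a toy sketch; the data structures below are hypothetical, not the paper's annotation scheme, but they illustrate the idea that focus spaces for embedded cores and contributors stay on the stack, keeping their entities accessible, until the enclosing RDA-segment is completed.

```python
# Toy stack-based focus-space model of antecedent accessibility (illustrative only).
class FocusStack:
    def __init__(self):
        self.stack = []                              # (segment_id, entity set) pairs

    def open_space(self, segment_id):
        """Push a focus space for a unit occurring inside the given RDA-segment."""
        self.stack.append((segment_id, set()))

    def mention(self, entity):
        """Record a discourse entity in the current (topmost) focus space."""
        self.stack[-1][1].add(entity)

    def complete_segment(self, segment_id):
        """Pop the focus spaces of embedded units only when the RDA-segment
        in which they occur is completed."""
        while self.stack and self.stack[-1][0] == segment_id:
            self.stack.pop()

    def accessible_antecedents(self):
        """Entities in any focus space still on the stack are accessible."""
        return set().union(*(ents for _, ents in self.stack)) if self.stack else set()
```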

    Scoring Coreference Chains with Split-Antecedent Anaphors

    Anaphoric reference is an aspect of language interpretation covering a variety of types of interpretation beyond the simple case of identity reference to entities introduced via nominal expressions, which is the case covered by the traditional coreference task in its most recent incarnation in ONTONOTES and similar datasets. One of the cases that go beyond simple coreference is anaphoric reference to entities that must be added to the discourse model via accommodation, and in particular split-antecedent reference to entities constructed out of multiple discourse entities, as in split-antecedent plurals and in some cases of discourse deixis. Although this type of anaphoric reference is now annotated in many datasets, systems interpreting such references cannot be evaluated using the Reference Coreference Scorer (Pradhan et al., 2014). As part of the work towards a new scorer for anaphoric reference able to evaluate all aspects of anaphoric interpretation covered by the Universal Anaphora initiative, we propose in this paper a solution to the technical problem of generalizing existing metrics for identity anaphora so that they can also be used to score cases of split antecedents. This is the first such proposal in the literature on anaphora or coreference, and it has been successfully used to score both split-antecedent plural references and discourse deixis in the recent CODI/CRAC shared tasks on anaphora resolution in dialogue.
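    As a purely illustrative example of what scoring split-antecedent links can involve (this is not the metric generalization proposed in the paper), one could map each anaphor to the set of antecedent entities it refers to and give credit for the overlap between the system's set and the gold set:

```python
# Toy link-counting scores for split-antecedent anaphors (illustrative only).
def split_antecedent_scores(gold, system):
    """gold / system: dict mapping an anaphor id to the set of antecedent entity ids."""
    tp = sum(len(gold[a] & system.get(a, set())) for a in gold)
    gold_links = sum(len(v) for v in gold.values())
    sys_links = sum(len(v) for v in system.values())
    recall = tp / gold_links if gold_links else 0.0
    precision = tp / sys_links if sys_links else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# "they" refers to both John (e1) and Mary (e2); the system recovers only one antecedent.
gold = {"they_17": {"e1", "e2"}}
system = {"they_17": {"e1"}}
print(split_antecedent_scores(gold, system))   # (1.0, 0.5, 0.666...)
```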